Compiler Optimization for Superscalar Systems: Global Instruction Scheduling without Copies

Authors

  • Philip H. Sweany
  • Steve Carr
  • Brett L. Huber
Abstract

Many of today's computer applications require computation power not easily achieved by computer architectures that provide little or no parallelism. A promising alternative is the parallel architecture, more specifically, the instruction-level parallel (ILP) architecture, which increases computation during each machine cycle. ILP computers allow parallel computation of the lowest-level machine operations within a single instruction cycle, including such operations as memory loads and stores, integer additions, and floating-point multiplications. ILP architectures, like conventional architectures, contain multiple functional units and pipelined functional units; but they have a single program counter and operate on a single instruction stream. Compaq Computer Corporation's AlphaServer system, based on the Alpha 21164 microprocessor, is an example of an ILP machine.

To effectively use parallel hardware and obtain performance advantages, compilers must identify the appropriate level of parallelism. For ILP architectures, the compiler must order the single instruction stream such that multiple low-level operations execute simultaneously whenever possible. This ordering of machine operations to effectively use an ILP architecture's increased parallelism is called instruction scheduling; it is an optimization not usually found in compilers for non-ILP architectures. Instruction scheduling is classified as local if it considers code only within a basic block and global if it schedules code across multiple basic blocks. A disadvantage of local instruction scheduling is its inability to consider context from surrounding blocks: while local scheduling can find parallelism within a basic block, it can do nothing to exploit parallelism between basic blocks.
Generally, global scheduling is preferred because it can take advantage of the added program parallelism available when the compiler is allowed to move code across basic block boundaries. Tjaden and Flynn, for example, found parallelism within a basic block quite limited. Using a test suite of scientific programs, they measured an average parallelism of 1.8 within basic blocks. In similar experiments on scientific programs ...
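The local scheduling described above can be illustrated with a toy greedy list scheduler that packs independent operations of one basic block into wide instructions. This is only a sketch: the 2-issue width, unit latencies, operation names, and dependence map below are invented for the example, not taken from the article.

```python
# Illustrative sketch of local (basic-block) list scheduling on a toy
# 2-issue machine with unit-latency operations.
def list_schedule(ops, deps, issue_width=2):
    """Greedily pack independent operations into wide instructions.

    ops  -- operation names in original program order
    deps -- maps an operation to the set of operations it must follow
    """
    issued = set()       # operations placed in earlier cycles
    cycles = []          # one list of issue slots per machine cycle
    remaining = list(ops)
    while remaining:
        # Ready: every predecessor was issued in some earlier cycle.
        ready = [op for op in remaining if deps.get(op, set()) <= issued]
        if not ready:
            raise ValueError("cyclic dependence graph")
        slot = ready[:issue_width]          # fill up to issue_width slots
        cycles.append(slot)
        issued |= set(slot)
        remaining = [op for op in remaining if op not in slot]
    return cycles

# The two loads are independent and issue together; the add waits for both.
deps = {"add": {"load_a", "load_b"}, "store": {"add"}}
print(list_schedule(["load_a", "load_b", "add", "store"], deps))
# [['load_a', 'load_b'], ['add'], ['store']]
```

A scheduler like this can never fill idle slots with work from another block; global scheduling, the subject of the article, goes further by moving operations across basic-block boundaries.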


Related Articles

Using Profile Information to Assist Advanced Compiler Optimization and Scheduling

Compilers for superscalar and VLIW processors must expose sufficient instruction-level parallelism in order to achieve high performance. Compile-time code transformations which expose instruction-level parallelism typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism alon...

Full text


[Article title not recovered; entry shows only figure labels: User Input, Probed Program, Probed Executable, Profile Information, Probing Library]

Instruction schedulers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to the hardware in order to achieve high performance. Traditional compiler instruction scheduling techniques typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level paralleli...

Full text

Support for Speculative Execution in High-Performance Processors

Superscalar and superpipelining techniques increase the overlap between the instructions in a pipelined processor, and thus these techniques have the potential to improve processor performance by decreasing the average number of cycles between the execution of adjacent instructions. Yet, to obtain this potential performance benefit, an instruction scheduler for this high-performance processor m...

Full text

An Optimal Instruction Scheduler for Superscalar Processor

Performance in superscalar processing strongly depends on the compiler’s ability to generate codes that can be executed by hardware in an optimal or near optimal order. Generating optimal code is an NP-complete problem. However, there is a need for highly optimized code, such as in superscalar or real-time systems. In this paper, an instruction scheduling scheme for optimizing a program trace i...

Full text


Journal:
  • Digital Technical Journal

Volume 10, Issue 1

Pages: -

Publication date: 1998